Background given in the case description: “The course lasts twelve weeks. Throughout the course, students are assessed in multiple ways, including weekly quizzes, slide exams, and essays. They also take an end of course exam that includes essay, short answer, and multiple-choice components. The final data has the average scores for those assessments. Students are required to take laboratory practical (gross anatomy, histology, pathology and neuroanatomy) exams which are averaged into the final grade. Students also take a National Board of Medical Examiners (NBME) standardized exam in each course. Theoretically, if they do well on these exams, they should do well in the course overall. All of the assessments have been calculated on a 100- point scale.”
1.1 Questions from Learning Objective 4
How should we define not pass/ marginal pass/ pass thresholds and criteria?
How do these thresholds compare to final exam scores?
1.2 Data
There are 92 students. 2 students scored below 70 on the final exam which is grounds for an immediate failing threshold. 3 students scored between a 70 & 80 on the final which could be considered students for further scrutiny.
Code
label(dt$quiz) <-"Quiz score (mean weekly performance)"label(dt$nbme) <-"National Board of Medical Examiners score"label(dt$ga) <-"Gross anatomy (mean score)"label(dt$slide) <-"Slide exams score (mean)"label(dt$part.c) <-"Part C score"label(dt$essay) <-"Essay score (mean)"label(dt$eob.exam) <-"End of Block (course term) exam"label(dt$final) <-"Final score"tableby(~quiz + nbme + ga + slide + part.c + essay+ eob.exam + final , data=dt, topclass="Rtable1-zebra",control =tableby.control(numeric.stats =c("meansd","median","range"))) %>%summary()%>%kable()
Warning in tableby(~quiz + nbme + ga + slide + part.c + essay + eob.exam + : unused arguments: topclass
Overall (N=92)
Quiz score (mean weekly performance)
Mean (SD)
0.821 (0.069)
Median
0.820
Range
0.660 - 1.000
National Board of Medical Examiners score
Mean (SD)
89.870 (5.452)
Median
91.000
Range
74.000 - 100.000
Gross anatomy (mean score)
Mean (SD)
82.974 (9.887)
Median
83.929
Range
49.490 - 100.000
Slide exams score (mean)
Mean (SD)
82.252 (10.005)
Median
83.925
Range
53.060 - 100.000
Part C score
Mean (SD)
81.112 (8.700)
Median
81.590
Range
59.630 - 100.000
Essay score (mean)
Mean (SD)
86.767 (5.418)
Median
87.250
Range
71.250 - 95.750
End of Block (course term) exam
Mean (SD)
84.913 (6.833)
Median
85.000
Range
65.000 - 99.000
Final score
Mean (SD)
88.457 (5.628)
Median
88.500
Range
68.000 - 100.000
Notes:
There is no missingness in the dataset.
Part C score is “like a catch-all exam if the knowledge can’t be obtained through their lab and essay assessments.”
Not included in our data (but included in the student evaluation) is the score for the laboratory practical which “has multiple assessment scores which are captured in the data such as the histology, pathology, etc. - which are not specifically named like that.”
We will disregard this for our purposes
1.2.1 Scores based on stratifying by passing the final exam at 70% threshold
Below is a pairs plot where students are divided into groups depending on whether they passed or if they scored below 80% which we called “almost fail”. Like mentioned above, these students deserve more scrutiny - how did they perform on other assessments?
Code
set.seed(123123)pc =prcomp(dt %>%select(-id,-final) %>%mutate_all(scale))#create pairs plotggpairs(dt %>%select(-id),aes(color=ifelse(final>80,"pass","(almost)fail\n")),progress = F) +labs(caption ="Stratified by final scores: (almost)fail is <=80 and pass is >80")
There appears to be pretty distinct separation in the distribution in the performance of students who scored less than 80% on the final vs. students who exceeded that threshold. Looking at the rightmost column, there is strong correlation (>0.702) between scores on the final exam and other metrics.
It is likely wise to stick with precedent of 70% being a pass threshold, since the distributions of all other evaluations are even more disparate as shown by the density plots below:
However, there isn’t as much separation for GA - perhaps one student did relatively better in Gross Anatomy, but their other scores during and outside of the course are much below their classmates.
If we stratify by who got a 90 on the final, for example, there is much more overlap for the distributions - so we aren’t separating out who is not performing well enough overall with this stringent of a cutoff.
We did further analysis with K means clustering with 4 groups(A,B,C,F) in an effort to identify particular groups of students classified based on their scores as an additional way to justify a separation threshold for failing. But as we can see, we failed to identify any useful subgroup in the data. But we can see that there are potential outliers in the student in the lower performance, we just need an evaluation method that help us capture these students with low performance
What if we take a weighted average to calculate the overall score where 0.4 comes from the NBME exam,since it is independent from the instructor’s coursework but correlated with a student success, and 0.6 is a combination of:
quiz scores
gross anatomy
slide quiz scores
Part C
essay scores
the end of block exam
and set the students in the lowest 5% quantile to fail?
Not like the original design which has a clear cut-off, this metric however can better reflect student overall performance in all fields, those who has a lower score tends to perform worse in most of other fields
Source Code
---title: "Case 1 Learning Objective 4"author: "Lisa Levoir and Jeffrey Zhuohui Liang"date: "`r format(Sys.time(), '%B %d, %Y')`"format: html: theme: yeti code-fold: true code-tools: true html-math-method: katex toc: true toc-depth: 3 fig-width: 13 fig-height: 10 toc-title: "Contents" number-sections: true self-contained: true self-contained-math: true smooth-scroll: true fontsize: 0.8em title-block-banner: true citation-location: margineditor: visual---```{r setup}#| echo: false#| message: false#| warning: false#| include: false#load libraries (more than I need but nice ot have)library(tidyverse)library(knitr)#library(table1) #Create HTML Tables of Descriptive Statistics https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html#library(OMTM1) #https://github.com/schildjs/OMTM1/library(Hmisc)library(viridis) #colorslibrary(tidyverse)library(readxl)library(corrplot)library(arsenal)library(GGally)library(ggthemes)library(ggfortify)library(plotly)library(dplyr)library(tidyr)library(cowplot) #allows me to use plotgridtheme_set(ggthemes::theme_calc())scale_color_discrete =scale_color_calc()#setwd("/Users/lisalevoir/BIOS7351_Collab/github/BIOS_Collaboration") #this line used to work until I moved this qmd file to my github folder (I need to run this in the console when I switch projects)#knitr::opts_knit$set(root.dir = "/Users/lisalevoir/BIOS7351_Collab/github/BIOS_Collaboration/case1") #this is a global option for knittingdt <- readxl::read_xlsx("./Case1.xlsx")```# Analyzing medical students scoresBackground given in the case description: "The course lasts twelve weeks. Throughout the course, students are assessed in multiple ways, including weekly quizzes, slide exams, and essays. They also take an end of course exam that includes essay, short answer, and multiple-choice components. The final data has the average scores for those assessments. Students are required to take laboratory practical (gross anatomy, histology, pathology and neuroanatomy) exams which are averaged into the final grade. Students also take a National Board of Medical Examiners (NBME) standardized exam in each course. Theoretically, if they do well on these exams, they should do well in the course overall. All of the assessments have been calculated on a 100- point scale."## Questions from Learning Objective 4- How should we define not pass/ marginal pass/ pass thresholds and criteria?- How do these thresholds compare to final exam scores?## DataThere are `r length(unique(dt$id))` students. `r sum(dt$final <= 70)` students scored below 70 on the final exam which is grounds for an immediate failing threshold. `r sum(dt$final <= 80) - sum(dt$final <=70)` students scored between a 70 & 80 on the final which could be considered students for further scrutiny.```{r}label(dt$quiz) <-"Quiz score (mean weekly performance)"label(dt$nbme) <-"National Board of Medical Examiners score"label(dt$ga) <-"Gross anatomy (mean score)"label(dt$slide) <-"Slide exams score (mean)"label(dt$part.c) <-"Part C score"label(dt$essay) <-"Essay score (mean)"label(dt$eob.exam) <-"End of Block (course term) exam"label(dt$final) <-"Final score"tableby(~quiz + nbme + ga + slide + part.c + essay+ eob.exam + final , data=dt, topclass="Rtable1-zebra",control =tableby.control(numeric.stats =c("meansd","median","range"))) %>%summary()%>%kable()```Notes:- There is no missingness in the dataset.- Part C score is "like a catch-all exam if the knowledge can't be obtained through their lab and essay assessments."- Not included in our data (but included in the student evaluation) is the score for the laboratory practical which "has multiple assessment scores which are captured in the data such as the histology, pathology, etc. - which are not specifically named like that." - We will disregard this for our purposes### Scores based on stratifying by passing the final exam at 70% threshold```{r}dt = dt %>%mutate(quiz =100*quiz)tableby(pass~.,dt %>%select(-id) %>%mutate(pass = final>70),control =tableby.control(numeric.stats =c("meansd","median","range"),digits=1 )) %>%summary() %>% knitr::kable()```Below is a pairs plot where students are divided into groups depending on whether they passed or if they scored below 80% which we called "almost fail". Like mentioned above, these students deserve more scrutiny - how did they perform on other assessments?```{r}#| fig-width: 10#| fig-height: 12set.seed(123123)pc =prcomp(dt %>%select(-id,-final) %>%mutate_all(scale))#create pairs plotggpairs(dt %>%select(-id),aes(color=ifelse(final>80,"pass","(almost)fail\n")),progress = F) +labs(caption ="Stratified by final scores: (almost)fail is <=80 and pass is >80")```There appears to be pretty distinct separation in the distribution in the performance of students who scored less than 80% on the final vs. students who exceeded that threshold. Looking at the rightmost column, there is strong correlation (\>0.702) between scores on the final exam and other metrics.It is likely wise to stick with precedent of 70% being a pass threshold, since the distributions of all other evaluations are even more disparate as shown by the density plots below:```{r}#| fig-width: 15#| fig-height: 10#| warning: falseplots <-list()for(i in1:7){ plot <-ggplot(dt, aes_string(x=names(dt[,i+1]))) +geom_density(aes(color=ifelse(final>70, "pass", "fail"))) +theme(legend.position="none") plots[[i]] <- plot}plot_grid(plots[[1]], plots[[2]], plots[[3]], plots[[4]], plots[[5]], plots[[6]], plots[[7]], ncol =4)```However, there isn't as much separation for GA - perhaps one student did relatively better in Gross Anatomy, but their other scores during and outside of the course are much below their classmates.```{r}labels <-c("student a", "student b", "class mean")summary <-rbind(dt[which(dt$final<70), 2:9], colMeans(dt[,2:9]))summary2 <-cbind(labels, summary)summary2 %>%mutate(across(where(is.numeric), ~round(., digits =0))) %>%kable(.)```If we stratify by who got a 90 on the final, for example, there is much more overlap for the distributions - so we aren't separating out who is not performing well enough overall with this stringent of a cutoff.```{r}plots <-list()for(i in1:7){ plot <-ggplot(dt, aes_string(x=names(dt[,i+1]))) +geom_density(aes(color=ifelse(final>90, "pass", "fail"))) +theme(legend.position="none") plots[[i]] <- plot}plot_grid(plots[[1]], plots[[2]], plots[[3]], plots[[4]], plots[[5]], plots[[6]], plots[[7]], ncol =4)```### PCAWe did further analysis with K means clustering with 4 groups(A,B,C,F) in an effort to identify particular groups of students classified based on their scores as an additional way to justify a separation threshold for failing. But as we can see, we failed to identify any useful subgroup in the data. But we can see that there are potential outliers in the student in the lower performance, we just need an evaluation method that help us capture these students with low performance```{r}#| warning: false#k means clusterscl =kmeans(dt %>%select(-id) %>%mutate_all(scale),centers =4)$clusterdt %>%left_join(tibble(id = dt$id,cluster =as.factor(cl))) %>%cbind(pc$x) %>%ggplot(aes(x=PC1,y=final,color=cluster)) +scale_color_calc()+geom_jitter()autoplot(pc,color =as.factor(cl))```## Can we create a better metric?What if we take a weighted average to calculate the overall score where 0.4 comes from the NBME exam,since it is independent from the instructor's coursework but correlated with a student success, and 0.6 is a combination of:- quiz scores- gross anatomy- slide quiz scores- Part C- essay scores- the end of block examand set the students in the lowest 5% quantile to fail?```{r}#| warning: false#| fig-width: 10#| fig-height: 10overall =0.6*rowMeans(dt %>%select(-id,-final,-nbme)) +0.4*dt$nbmedt %>%select(-id) %>%mutate(overall = overall) %>%ggpairs(.,aes(color =ifelse( overall>quantile(overall,0.05),"pass","fail")),progress = F)#graph pcadt %>%mutate(overall = overall,pass = overall>quantile(overall,0.05)) %>%cbind(pc$x) %>%ggplot(aes(y=PC2,x=PC1,color=pass))+geom_jitter()```Not like the original design which has a clear cut-off, this metric however can better reflect student overall performance in all fields, those who has a lower score tends to perform worse in most of other fields```{r}#| include: false#| fig-width: 10#| fig-height: 10# # overall = scale(pc$x)[,1:2] %*% c(-0.8,0.2) # # dt %>% select(-id) %>% # mutate(overall = as.numeric(overall)) %>% # ggpairs(.,# aes(color = ifelse(# overall> quantile(overall,0.05),# "pass","fail")),# progress = F)# # dt %>% # mutate(overall = overall,# pass = overall>quantile(overall,0.05)) %>% # cbind(pc$x) %>% # ggplot(aes(x=PC1,y=PC2,color=pass))+# geom_jitter()```